Troy Mazerolle

Student Number 8972394

Practical Lab 2 - Data Visualization and Publication

In [ ]:
# Utility Libraries
import numpy as np
import pandas as pd

# Graphing Libraries
import plotly
import plotly.graph_objects as go
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sb

plotly.offline.init_notebook_mode()

Plot 1: Stock Prices using plotly

For my first graph I want to demonstrate how to use plotly to plot the historical prices of a stock using a candlestick chart. I chose to do this because one of my goals for this program is to learn how to apply machine learning methods to financial markets.

For this demonstration, I will use the historical daily price data of Unity Software (ticker symbol U), the company behind the Unity game engine. I chose Unity simply because I am currently learning how to use the engine.

In [ ]:
unity = pd.read_csv("U.csv")

fig = go.Figure(data = [go.Candlestick(x = unity['Date'],
                                       open = unity['Open'],
                                       high = unity['High'],
                                       low = unity['Low'],
                                       close = unity['Close'])])

fig.update_layout(
    title = "Unity Stock Prices",
    yaxis_title = "Price",
    xaxis_title = "Date",
    xaxis_rangeslider_visible = False
)
fig.show()

A candlestick can be read as follows:

  • The top and bottom of each stick (the wicks) mark the highest and lowest prices that the stock reached during the period. Since this is a daily time frame, they are the highest and lowest prices the stock traded at that day.
  • The two ends of each body mark the opening and closing prices for the period. Since this is a daily time frame, they are the price the stock started the day at and the price it finished the day at. Which end is which depends on the color:
    • If the candlestick is green, the bottom of the body is the opening price and the top of the body is the closing price, so the stock ended the day at a higher price than it started.
    • If the candlestick is red, the top of the body is the opening price and the bottom of the body is the closing price, so the stock ended the day at a lower price than it started.
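
The color rule above can be checked directly from the open and close prices. The following is a minimal sketch on a small made-up data frame; the prices and the `candle_color` helper are illustrative assumptions and are not taken from the Unity data. A day whose close equals its open is labelled "flat" here, a case the description above does not cover.

```python
import pandas as pd

# Hypothetical prices for illustration only; not taken from U.csv
candles = pd.DataFrame({
    'Open':  [10.0, 12.0, 11.0],
    'High':  [12.5, 12.2, 11.4],
    'Low':   [9.8, 10.5, 10.9],
    'Close': [12.0, 11.0, 11.0],
})

def candle_color(row):
    # Green: closed above the open. Red: closed below the open.
    if row['Close'] > row['Open']:
        return 'green'
    if row['Close'] < row['Open']:
        return 'red'
    return 'flat'

candles['Color'] = candles.apply(candle_color, axis = 1)

# The body spans the open and close; the wicks extend to the high and low
candles['BodyBottom'] = candles[['Open', 'Close']].min(axis = 1)
candles['BodyTop'] = candles[['Open', 'Close']].max(axis = 1)
print(candles)
```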

Plot 2: Plotting Relative Strength Index with matplotlib

The Relative Strength Index (RSI) is an indicator that measures how overbought or oversold a stock is. The formula is $RSI = 100 - \frac{100}{1 + \frac{n_{up}}{n_{down}}}$, where $n_{up}$ and $n_{down}$ are the average gain and the average loss, respectively, over the period. Here we use a period of 10, which means that the RSI at date X is calculated from the 10 trading days leading up to X. As a general rule of thumb, an RSI above 70 means the stock is considered overbought, which signals that the price may start to fall. Conversely, an RSI below 30 means the stock is considered oversold, which signals that the price may start to rise.
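
To make the formula and the 70/30 rule of thumb concrete, here is a small worked sketch. The helper names `rsi_from_averages` and `classify_rsi` and the example averages are assumptions made for illustration, and returning 100 when there are no losses is one possible convention rather than something the formula itself defines.

```python
def rsi_from_averages(avg_gain, avg_loss):
    # RSI = 100 - 100 / (1 + avg_gain / avg_loss)
    if avg_loss == 0:
        # Assumption: with no losses in the window, treat RSI as its maximum
        return 100.0
    return 100 - 100 / (1 + avg_gain / avg_loss)

def classify_rsi(value, overbought = 70, oversold = 30):
    # Apply the rule-of-thumb thresholds from the text
    if value > overbought:
        return 'overbought'
    if value < oversold:
        return 'oversold'
    return 'neutral'

# Hypothetical averages: gains average 3%, losses average 1%,
# so n_up / n_down = 3 and RSI = 100 - 100 / 4, which is about 75
example = rsi_from_averages(0.03, 0.01)
print(example, classify_rsi(example))
```

When average gains and average losses are equal, the ratio is 1 and the RSI sits at 50, the midpoint of the scale.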

First, we need to write a function that takes in the data of the stock and the reference period, and outputs the RSI values:

In [ ]:
def rsi(stockData, period):
    # Daily return measured from open to close: (close - open) / open
    openPrices = stockData['Open']
    closePrices = stockData['Close']
    returns = ((closePrices - openPrices) / openPrices).to_numpy()

    rsIndexes = []
    # The window for day i covers the `period` days up to and including day i
    for i in range(period - 1, len(returns)):
        window = returns[(i - period + 1):(i + 1)]
        avgGain = window[window >= 0].sum() / period
        avgLoss = abs(window[window < 0].sum()) / period
        if avgLoss == 0:
            rsIndexes.append(100.0)  # No losses in the window: avoid dividing by zero
        else:
            rsIndexes.append(100 - (100 / (1 + avgGain / avgLoss)))

    return np.array(rsIndexes)

rsiValues = rsi(unity, 10)

Using matplotlib, we can now plot the relative strength index of Unity stock. We will also add horizontal lines at $RSI = 30$ and $RSI = 70$ to indicate when the stock might be oversold or overbought.

In [ ]:
# Dates that correspond to the RSI values (the last len(rsiValues) rows)
dates = unity['Date'].iloc[-len(rsiValues):]

fig, ax = plt.subplots()
ax.plot(dates, rsiValues)
ax.axhline(y = 70, color = 'g', linestyle = '--', linewidth = 2)  # Overbought threshold
ax.axhline(y = 30, color = 'r', linestyle = '--', linewidth = 2)  # Oversold threshold
ax.set_title("Relative Strength Index of Unity")
ax.set_xlabel('Date')
ax.set_ylabel('RSI')
Out[ ]:
Text(0, 0.5, 'RSI')

The date labels overlap and are difficult to read, but we can see that the stock was oversold towards the middle of the period. The stock is also currently overbought, which suggests that the price might go down.
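
One likely reason the labels are hard to read is that the dates are plotted as strings, so matplotlib draws a tick for every row. A possible fix, sketched here on synthetic stand-in data rather than the Unity series, is to parse the strings with `pd.to_datetime` and let matplotlib's date locators choose the ticks. The monthly tick spacing and the label format are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Synthetic stand-in data (an assumption for illustration, not the Unity series)
dateStrings = pd.date_range("2023-01-02", periods = 120, freq = "B").strftime("%Y-%m-%d")
rng = np.random.default_rng(0)
values = np.clip(50 + np.cumsum(rng.normal(0, 5, len(dateStrings))), 0, 100)

# Parse the date strings so matplotlib treats the x-axis as time, not categories
parsed = pd.to_datetime(pd.Series(dateStrings))

fig, ax = plt.subplots()
ax.plot(parsed, values)
ax.xaxis.set_major_locator(mdates.MonthLocator())             # One tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))   # For example "Jan 2023"
fig.autofmt_xdate()                                           # Rotate the tick labels
ax.set_xlabel('Date')
ax.set_ylabel('RSI')
```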

Plot 3: Analysis of Returns with seaborn

For my next chart, I want to compare the total gains and total losses of Microsoft and AMD. I chose these two stocks because Microsoft is generally considered a low-risk stock, while AMD is generally considered a high-risk stock. Using a bar chart, I want to show the difference between each stock's gains and losses visually.

In order to plot this efficiently, we need to organize the data so that the barplot function can read everything easily. To do this, we will:

  • Load in the data for Microsoft and AMD
  • Calculate the returns for each stock and put the results into their own data frame
  • Add a column that indicates whether each return belongs to MSFT or AMD
    • The barplot function will use this column to determine how to group the bars
  • Add a column that indicates whether the return was positive or negative for that observation
    • The barplot function will use this column to determine how to split the grouped bars
  • Take the absolute value of the returns so that they are all positive
    • This is just so the "POSITIVE" and "NEGATIVE" bars point in the same direction
In [ ]:
microsoft = pd.read_csv("MSFT.csv")
amd = pd.read_csv("AMD.csv")

microsoftReturns = (microsoft['Close'] - microsoft['Open']) / microsoft['Open']
amdReturns = (amd['Close'] - amd['Open']) / amd['Open']

microsoftDF = pd.DataFrame(data = {'Symbol': ['MSFT'] * len(microsoftReturns),
                                   'Return': microsoftReturns})
amdDF = pd.DataFrame(data = {'Symbol': ['AMD'] * len(amdReturns),
                             'Return': amdReturns})

# Stack the two data frames and renumber the rows from 0
stockData = pd.concat([microsoftDF, amdDF], axis = 0, ignore_index = True)

# Label each return by its sign, then keep only its magnitude
stockData['Direction'] = np.where(stockData['Return'] >= 0, "POSITIVE", "NEGATIVE")
stockData['Return'] = stockData['Return'].abs()
print(stockData)
    Symbol    Return Direction
0     MSFT  0.001965  POSITIVE
1     MSFT  0.008455  POSITIVE
2     MSFT  0.001570  POSITIVE
3     MSFT  0.021779  NEGATIVE
4     MSFT  0.013074  POSITIVE
..     ...       ...       ...
497    AMD  0.018636  NEGATIVE
498    AMD  0.010653  POSITIVE
499    AMD  0.024834  POSITIVE
500    AMD  0.016601  NEGATIVE
501    AMD  0.043179  NEGATIVE

[502 rows x 3 columns]

Now that we have all the columns we need, we move on to graphing the returns.

In [ ]:
sb.barplot(data = stockData, x = "Symbol", y = "Return", hue = "Direction", estimator = sum, errorbar = None)
Out[ ]:
<Axes: xlabel='Symbol', ylabel='Return'>

From the bar graph, we can see that for both MSFT and AMD the total gains exceed the total losses over this period. While AMD has significantly higher total gains, it also has significantly higher total losses. This is consistent with a stock like Microsoft generally being considered low-risk, while a stock like AMD is generally considered high-risk. Both stocks appear to have about the same net return (AMD appears to be slightly higher), but Microsoft gets there with less volatility.
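
The comparison read off the bars can also be computed. The sketch below uses two made-up symbols with hypothetical returns (not the MSFT and AMD data) that have the same net return but very different gross movement. Note that adding simple daily returns is only a rough summary, since it ignores compounding.

```python
import pandas as pd

# Hypothetical daily returns for two made-up symbols (not the MSFT/AMD data)
toy = pd.DataFrame({
    'Symbol':    ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB'],
    'Return':    [0.010, 0.005, 0.010, 0.050, 0.065, 0.030],
    'Direction': ['POSITIVE', 'NEGATIVE', 'POSITIVE',
                  'POSITIVE', 'NEGATIVE', 'POSITIVE'],
})

# Total gains and total losses per symbol: the heights of the bars
totals = toy.pivot_table(index = 'Symbol', columns = 'Direction',
                         values = 'Return', aggfunc = 'sum', fill_value = 0)

# Net return is the gap between the two bars; gross movement is their sum
totals['Net'] = totals['POSITIVE'] - totals['NEGATIVE']
totals['Gross'] = totals['POSITIVE'] + totals['NEGATIVE']
print(totals)
```

In this toy data both symbols net roughly 0.015, but BBB moves about 0.145 in total to get there compared with about 0.025 for AAA, which is the pattern the bar chart suggests for AMD and Microsoft.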

Plot 4: Plotting Functions

This last plot is not a finance plot. Instead, I wanted to practice plotting functions and lines of best fit, especially since we will be studying linear regression shortly.

We will start by generating some data. We will take the line $y = 3x + 4$ and add normally distributed noise with mean $\mu = 0$ and standard deviation $\sigma = 50$ to each value. We will then draw the noisy points as a scatterplot and overlay the true line.

In [ ]:
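# The true underlying line: y = 3x + 4
def func(x):
    return 3 * x + 4

xvals = np.arange(0, 101, 1)

# Normally distributed noise with mean 0 and standard deviation 50
mean = 0
std = 50
yvals = func(xvals) + np.random.normal(mean, std, len(xvals))

fig, lineplot = plt.subplots()
lineplot.plot(xvals, yvals, '.')    # Noisy observations
lineplot.plot(xvals, func(xvals))   # True line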
Out[ ]:
[<matplotlib.lines.Line2D at 0x21af3af56d0>]

The same structure can be used to display linear regression models. In regression, however, instead of generating data from a predetermined slope and y-intercept, we are given the data and have to solve mathematically for the slope and y-intercept that fit it best.
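
As a preview of that step, here is a minimal sketch that recovers a slope and intercept from noisy data using the closed-form ordinary least squares solution for a single predictor, and cross-checks it against `np.polyfit`. The fixed seed is an arbitrary choice so that the noise is repeatable; it is not part of the plot above.

```python
import numpy as np

rng = np.random.default_rng(42)   # Fixed seed (an arbitrary choice) for repeatable noise
xvals = np.arange(0, 101, 1)
yvals = 3 * xvals + 4 + rng.normal(0, 50, len(xvals))

# Closed-form ordinary least squares for a single predictor:
# slope = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
xbar, ybar = xvals.mean(), yvals.mean()
slope = np.sum((xvals - xbar) * (yvals - ybar)) / np.sum((xvals - xbar) ** 2)
intercept = ybar - slope * xbar

# np.polyfit with degree 1 solves the same problem and should agree
polySlope, polyIntercept = np.polyfit(xvals, yvals, 1)
print(slope, intercept)
```

With noise this large ($\sigma = 50$), the fitted slope should land near 3, but the intercept can be some distance from 4, because a vertical shift of a few units is small compared with the scatter.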